Managing Uncertainty within Value Function Approximation in Reinforcement Learning
Authors
Abstract
The dilemma between exploration and exploitation is an important topic in reinforcement learning (RL). Most successful approaches to this problem rely on some form of uncertainty information about the values estimated during learning. On the other hand, scalability is a well-known weakness of RL algorithms, and value function approximation has become a major research topic. Both problems arise in real-world applications, yet few approaches can approximate the value function while maintaining uncertainty information about the estimates, and even fewer use this information to address the exploration/exploitation dilemma. In this paper, we show how such uncertainty information can be derived from a Kalman-based Temporal Differences (KTD) framework. An active learning scheme for a second-order value-iteration-like algorithm (named KTD-Q) is proposed. We also suggest adaptations of several existing exploration/exploitation schemes. This is a first step towards a global handling of continuous state and action spaces together with the exploration/exploitation dilemma.
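As a concrete illustration of the idea, the sketch below shows how a Gaussian posterior over the weights of a linear value function, of the kind a Kalman-filter-based method maintains (mean theta, covariance P), yields both a Q-value estimate and an uncertainty-driven, optimistic action choice. This is a minimal sketch under those assumptions, not the KTD algorithm itself; the function names, the feature encoding, and the bonus weight beta are illustrative.

```python
import numpy as np

def q_value_with_uncertainty(theta_mean, theta_cov, phi):
    """Q-value estimate and its standard deviation for one state-action
    feature vector phi, given a Gaussian posterior over the linear
    weights (mean theta_mean, covariance theta_cov)."""
    q = phi @ theta_mean         # point estimate: Q = phi^T theta
    var = phi @ theta_cov @ phi  # propagated variance: phi^T P phi
    return q, float(np.sqrt(max(var, 0.0)))

def optimistic_action(theta_mean, theta_cov, features_per_action, beta=1.0):
    """Pick the action maximising an upper confidence bound Q + beta * std,
    one common way to turn value uncertainty into exploration.
    'features_per_action' maps each action to its feature vector phi(s, a)."""
    scores = []
    for phi in features_per_action:
        q, std = q_value_with_uncertainty(theta_mean, theta_cov, phi)
        scores.append(q + beta * std)
    return int(np.argmax(scores))
```

Other dilemma schemes can be driven by the same posterior, for instance sampling a weight vector from it instead of taking an upper bound.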
Similar resources
Managing Uncertainty within KTD
The dilemma between exploration and exploitation is an important topic in reinforcement learning (RL). Most successful approaches to this problem rely on some form of uncertainty information about the values estimated during learning. On the other hand, scalability is a well-known weakness of RL algorithms, and value function approximation has become a major research topic. Both problems ari...
Hierarchical Decision Making In Electricity Grid Management
The power grid is a complex and vital system that necessitates careful reliability management. Managing the grid is a difficult problem with multiple time scales of decision making and stochastic behavior due to renewable energy generation, variable demand, and unplanned outages. Solving this problem in the face of uncertainty requires a new methodology with tractable algorithms. In this work, ...
Count-Based Exploration in Feature Space for Reinforcement Learning
We introduce a new count-based optimistic exploration algorithm for reinforcement learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited s...
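As a rough illustration of the count-based idea (not the feature-space construction used in that paper), the sketch below buckets a feature vector into a discrete key and pays an exploration bonus that decays with the visit count of that key. The class name, the bucketing scheme, and the beta / sqrt(n) form of the bonus are assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict

class HashedCountBonus:
    """Exploration bonus beta / sqrt(n), where n is the visit count of a
    discretized (bucketed) feature key. This stands in for the
    generalised counts that count-based methods derive over large
    state-action spaces."""
    def __init__(self, beta=0.5, bins=16):
        self.beta = beta
        self.bins = bins
        self.counts = defaultdict(int)

    def bonus(self, features):
        # Bucket the (assumed [0, 1]-scaled) features into a discrete key.
        key = tuple(np.floor(np.asarray(features) * self.bins).astype(int))
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])
```

In use, the bonus is simply added to the environment reward before the value update, making rarely visited regions look temporarily more attractive.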
A Tutorial on Linear Function Approximators for Dynamic Programming and Reinforcement Learning
A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs. This article reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approx...
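For reference, a compact tabular value iteration of the kind such tutorials begin with might look as follows; the array layout (actions first) is an arbitrary choice for this sketch.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration.
    P: transition probabilities, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a).
    R: expected immediate rewards, shape (A, S).
    Returns the optimal value function V and a greedy policy."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * (P @ V)   # Q[a, s] = R[a, s] + gamma * E[V(s')]
        V_new = Q.max(axis=0)     # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=0)    # greedy policy: best action per state
```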
Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning
In this paper, we advocate the use of Sparse Distributed Memories (SDMs) for on-line, value-based reinforcement learning (RL). SDMs provide a linear, local function approximation scheme, designed to work when a very large, high-dimensional input (address) space has to be mapped into a much smaller physical memory. We present an implementation of the SDM architecture for on-line, value-based RL ...
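A toy version of that read/write cycle might look as follows. Classical SDMs use binary addresses with a Hamming-distance activation radius; this sketch substitutes real-valued addresses and a Euclidean radius, and all names and constants are illustrative rather than taken from the paper.

```python
import numpy as np

class SparseDistributedMemory:
    """Toy SDM-style linear-local approximator: a fixed set of random
    'hard locations'; an input activates every location within a given
    radius, and the predicted value averages the activated contents."""
    def __init__(self, n_locations, dim, radius, rng=None):
        rng = rng or np.random.default_rng(0)
        self.addresses = rng.uniform(0.0, 1.0, size=(n_locations, dim))
        self.contents = np.zeros(n_locations)
        self.radius = radius

    def _active(self, x):
        # Indices of all hard locations within the activation radius of x.
        dists = np.linalg.norm(self.addresses - x, axis=1)
        return np.flatnonzero(dists <= self.radius)

    def predict(self, x):
        idx = self._active(x)
        return self.contents[idx].mean() if idx.size else 0.0

    def update(self, x, target, lr=0.1):
        """Move each activated location's content toward the target,
        an on-line, TD-style write shared across local locations."""
        idx = self._active(x)
        self.contents[idx] += lr * (target - self.contents[idx])
```

Because only the activated locations are read and written, each update is local and cheap, which is what makes the scheme attractive for on-line RL.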